Profilers

Tracy

  • Tracy .

  • Releases .

  • odin-tracy .

  • GPU APIs: OpenGL, Vulkan, Direct3D 11/12, Metal, OpenCL, CUDA.

  • Direct support for C, C++, Lua, Python and Fortran. Third-party bindings exist for Rust, Zig, C#, OCaml, Odin, etc.

  • Platforms: Windows, Linux, FreeBSD, Android, WSL, OSX, iOS, QNX.

  • Cool demo video .

Advantages of Tracy
  • You can profile CPU, GPU, locks, memory allocations, context switches, and more.

  • Statistical information about zones, trace comparisons, or inclusion of inline function frames in call stacks (even in statistics of sampled stacks) are features unique to Tracy.

  • Tracy uses low-level kernel APIs, or even raw assembly, where other profilers rely on layers of abstraction.

  • Tracy has been multi-platform from the very beginning, on both the client and the server side. Other profilers tend to have Windows-specific graphical interfaces.

  • Tracy can handle millions of frames, zones, memory events, and so on, while other profilers tend to target very short captures.

  • Tracy provides a mapping of source code to the assembly, with detailed information about the cost of executing each instruction on the CPU.


Server and Client
  • In Tracy terminology, the profiled application is a client, and the profiler itself is a server. It was named this way because the client is a thin layer that just collects events and sends them for processing and long-term storage on the server. The fact that the server needs to connect to the client to begin the profiling session may be a bit confusing at first.

  • You may profile a game on a mobile phone over the wireless connection, with the profiler running on a desktop computer. Or you can run the client and server on the same machine, using a localhost connection. It is also possible to embed the visualization front-end in the profiled application, making the profiling self-contained.

Notes

  • I had problems running Tracy together with ASan; the game sometimes crashed. Keep that in mind.

Performance

Zones gaps
  • I was getting gaps of 200-470 ns between zones.

    • Check https://github.com/wolfpld/tracy/issues/1212

    • slomp (Tracy github contributor):
      "As for the "Tracy achieves such small overhead (only 2.25 ns)", well, there's a lot of nuance here (with rdtsc and such), but put simply, on a modern x64 machine, an "empty" zone should be between 10ns to 50ns."

    • Maybe this is related to the allocation made by ___tracy_alloc_srcloc.

      • Makes sense. Source location information in most Tracy zone annotations is handled at compile time and stored in static, read-only data blocks in the program binary, so no dynamic allocation needs to happen.

  • The first zone takes longer than others.

    • This happened to both me and slomp.

    • wolfpld (Tracy author):

      • Queue block allocation cost is amortized.

  • After disabling call stack collection, I got around 80-200 ns between zones, sometimes as low as 30 ns. So the problem was indeed the call stack (I was using a depth of 2 before). This could probably be improved further by not using the C API and avoiding the extra allocations. I still need to check whether something similar to the C++ macro can be done in Odin, since the language offers little support for metaprogramming :/
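
  • To put numbers like these in context, the cost of an empty zone can be estimated with a tight loop. The following is a rough micro-benchmark sketch, not Tracy's own measurement method. average_empty_zone_ns is a hypothetical helper, and the fallback macro exists only so the snippet compiles when Tracy is not on the include path (in which case the loop measures nothing meaningful):

```cpp
#include <chrono>
#include <cstdint>

// Use the real Tracy header when it is on the include path; otherwise fall
// back to an empty macro, mirroring what Tracy does when TRACY_ENABLE is
// not defined.
#if defined(__has_include)
#  if __has_include(<tracy/Tracy.hpp>)
#    include <tracy/Tracy.hpp>
#  endif
#endif
#ifndef ZoneScoped
#  define ZoneScoped
#endif

// Hypothetical helper: average wall-clock cost, in nanoseconds, of opening
// and closing one empty zone. This is a coarse estimate; loop overhead and
// timer resolution are included in the result.
double average_empty_zone_ns(std::uint64_t iterations)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i)
    {
        ZoneScoped; // the zone begins here and ends at the closing brace
    }
    const auto stop = clock::now();
    const auto total_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return iterations == 0
        ? 0.0
        : static_cast<double>(total_ns) / static_cast<double>(iterations);
}
```

  • Calling average_empty_zone_ns(1000000) in an optimized build with TRACY_ENABLE defined gives a ballpark figure to compare against the 10-50 ns range quoted above. The first iterations may be slower, as noted for the first zone.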

Start profiling

  • Profiling debug builds makes little sense, as the unoptimized code and additional checks (asserts, etc.) completely change how the program behaves.

  • In the default configuration, Tracy is disabled. This way, you don’t have to worry that the production builds will collect profiling data. To enable profiling, you will probably want to create a separate build configuration, with the TRACY_ENABLE  define.

    • Make sure that this macro is defined for all files across your project (e.g. it should be specified in the CFLAGS  variable, which is always passed to the compiler, or in an equivalent way), and not as a #define  in just some of the source files.

    • Tracy does not consider the value of the definition, only whether the macro is defined.

    • Be careful not to assign numeric values to Tracy defines, or you may be puzzled why constructs such as TRACY_ENABLE=0 don't work as you expect.

  • In addition, you should enable usage of the native architecture of your CPU (e.g. -march=native) to leverage the expanded instruction sets, which may not be available in the default baseline target configuration.

  • On Unix, make sure that the application is linked with libraries libpthread  and libdl . BSD systems will also need to be linked with libexecinfo .
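
  • The TRACY_ENABLE=0 pitfall mentioned above comes down to the difference between checking that a macro is defined and checking its value. A minimal sketch, using a hypothetical DEMO_PROFILER_ENABLE macro as a stand-in for TRACY_ENABLE:

```cpp
// DEMO_PROFILER_ENABLE is a hypothetical stand-in for TRACY_ENABLE.
// It is deliberately defined with the value 0.
#define DEMO_PROFILER_ENABLE 0

// Presence check, the style the notes describe for Tracy's defines:
// true because the macro exists, regardless of its value.
constexpr bool enabled_by_presence()
{
#ifdef DEMO_PROFILER_ENABLE
    return true;
#else
    return false;
#endif
}

// Value check, which is what "=0" intuitively suggests.
constexpr bool enabled_by_value()
{
#if DEMO_PROFILER_ENABLE
    return true;
#else
    return false;
#endif
}

static_assert(enabled_by_presence(), "a defined macro counts as enabled");
static_assert(!enabled_by_value(), "a value check would treat 0 as disabled");
```

  • In other words, the only way to turn such a feature off is to leave the macro undefined.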

When the profiling starts
  • By default, Tracy will begin profiling even before the program enters the main function.

  • However, suppose you don’t want to perform a full capture of the application lifetime. In that case, you may define the TRACY_ON_DEMAND  macro, which will enable profiling only when there’s an established connection with the server.

Short-lived Apps
  • In case you want to profile a short-lived program (for example, a compression utility that finishes its work in one second), set the TRACY_NO_EXIT  environment variable to 1. With this option enabled, Tracy will not exit until an incoming connection is made, even if the application has already finished executing. If your platform doesn’t support an easy setup of environment variables, you may also add the TRACY_NO_EXIT  define to your build configuration, which has the same effect.

  • You should note that if on-demand profiling is disabled (which is the default), then the recorded events will be stored in the system memory until a server connection is made and the data can be uploaded.

Client connection
  • By default, the Tracy client will announce its presence to the local network. If you want to disable this feature, define the TRACY_NO_BROADCAST macro. The program name that is sent out in the broadcast messages can be customized by using the TracySetProgramName(name) macro.

  • By default, the Tracy client will listen on all network interfaces. If you want to restrict it to only listening on the localhost interface, define the TRACY_ONLY_LOCALHOST  macro at compile-time, or set the TRACY_ONLY_LOCALHOST  environment variable to 1 at runtime.

  • If you need to use a specific Tracy client address, such as QNX requires, define the TRACY_CLIENT_ADDRESS  macro at compile-time as the desired string address.

  • By default, the Tracy client will listen on IPv6 interfaces, falling back to IPv4 only if IPv6 is unavailable. If you want to restrict it to only listening on IPv4 interfaces, define the TRACY_ONLY_IPV4  macro at compile-time, or set the TRACY_ONLY_IPV4  environment variable to 1 at runtime.

  • By default, the client and server communicate on the network using port 8086. The profiling session utilizes the TCP protocol, and the client sends presence announcement broadcasts over UDP.

  • Suppose for some reason you want to use another port. In that case, you can change it using the TRACY_DATA_PORT macro for the data connection and TRACY_BROADCAST_PORT macro for client broadcasts. Alternatively, you may change both ports at the same time by declaring the TRACY_PORT macro (specific macros listed before have higher priority). You may also change the data connection port without recompiling the client application by setting the TRACY_PORT environment variable. If a custom port is not specified and the default listening port is already occupied, the profiler will automatically try to listen on a number of other ports.

    • To enable network communication, Tracy needs to open a listening port. Make sure it is not blocked by an overzealous firewall or anti-virus program.

  • "Run the profiled application (e.g. demo ) in privileged mode (sudo/administrator) to enable even more features in Tracy."

    • The author of odin-tracy said that.
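
  • The data port precedence described above (specific macro, then TRACY_PORT, then the default 8086, with the environment variable applied at runtime) can be sketched as follows. This illustrates the described rules and is not copied from Tracy's source. resolve_data_port is a hypothetical helper, and the assumption that a usable environment value overrides the compiled-in port is mine:

```cpp
#include <cstdlib>

// Compile-time layer: the specific macro wins over TRACY_PORT, which wins
// over the default of 8086.
#if defined(TRACY_DATA_PORT)
constexpr int kCompiledDataPort = TRACY_DATA_PORT;
#elif defined(TRACY_PORT)
constexpr int kCompiledDataPort = TRACY_PORT;
#else
constexpr int kCompiledDataPort = 8086;
#endif

// Hypothetical helper. `env_value` stands for the contents of the TRACY_PORT
// environment variable (nullptr when unset). Assumption: a value that parses
// to a valid port number overrides the compiled-in one.
int resolve_data_port(const char* env_value)
{
    if (env_value != nullptr)
    {
        const long parsed = std::strtol(env_value, nullptr, 10);
        if (parsed > 0 && parsed <= 65535)
            return static_cast<int>(parsed);
    }
    return kCompiledDataPort;
}
```

  • A caller would pass std::getenv("TRACY_PORT") as the argument. The automatic fallback to other ports when the default is occupied is not modeled here.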

Instrumenting the App

  • The entire user-facing interface is contained in the public/tracy/Tracy.hpp header file.

Naming Threads
  • tracy::SetThreadName(name)

    • Set thread names for proper identification of threads.

    • Tracy will try to capture thread names through operating system data if context switch capture is active. However, this is only a fallback mechanism, and it shouldn’t be relied upon.

    • Declared in public/common/TracySystem.hpp.

  • tracy::SetThreadNameWithHint(name, int32_t groupHint)

    • This hint is an arbitrary number that is used to group threads together in the profiler UI.

    • The default value and the value for the main thread is zero.
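
  • A minimal sketch of naming worker threads at the top of their entry point. worker_entry and run_workers are hypothetical names, the group hint value 1 is arbitrary, and the Tracy calls are compiled only when TRACY_ENABLE is defined so the snippet also builds without Tracy:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// tracy::SetThreadNameWithHint is declared in common/TracySystem.hpp, which
// tracy/Tracy.hpp is assumed to include.
#if defined(TRACY_ENABLE)
#  include <tracy/Tracy.hpp>
#endif

// Hypothetical worker entry point: name the thread first, then do the work.
void worker_entry(std::atomic<int>& finished)
{
#if defined(TRACY_ENABLE)
    // Group hint 1 is an arbitrary choice; 0 is the default and the value
    // used for the main thread.
    tracy::SetThreadNameWithHint("worker", 1);
#endif
    finished.fetch_add(1);
}

// Hypothetical driver: starts `count` workers and returns how many completed.
int run_workers(unsigned count)
{
    std::atomic<int> finished{0};
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 0; i < count; ++i)
        threads.emplace_back(worker_entry, std::ref(finished));
    for (auto& t : threads)
        t.join();
    return finished.load();
}
```

  • Giving all workers the same hint should place them in one group in the profiler UI, separate from the main thread.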

Zones
  • tracy.ZoneNC("worker doing stuff", 0xff0000);
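
  • The line above appears to use the odin-tracy binding. For comparison, a C++ sketch of the same named, colored zone using Tracy's ZoneScopedNC macro. worker_do_stuff is a hypothetical workload, and the fallback macro only keeps the snippet buildable when Tracy is not on the include path:

```cpp
#include <cstdint>

#if defined(__has_include)
#  if __has_include(<tracy/Tracy.hpp>)
#    include <tracy/Tracy.hpp>
#  endif
#endif
#ifndef ZoneScopedNC
#  define ZoneScopedNC(name, color)
#endif

// Hypothetical workload: sums the integers below `n` inside a named,
// colored zone.
std::uint64_t worker_do_stuff(std::uint64_t n)
{
    // The zone covers the rest of this scope and ends when the function returns.
    ZoneScopedNC("worker doing stuff", 0xff0000);
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        sum += i;
    return sum;
}
```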

Impressions

  • (2025-11-04)

    • Compilation:

      • It's fine if you follow the compilation steps.

      • The whole thing takes up 1.33 GB after compiling everything. It uses Visual Studio, C++, and MSVC. So yeah, it's unpleasant.

Setup

  • Tracy Profiler supports MSVC, GCC, and clang.

  • You will need to use a reasonably recent version of the compiler due to the C++11 requirement.

  • All the files required to integrate your application with Tracy are contained in the public directory.

  • With the source code included in your project, add the public/TracyClient.cpp  source file to the IDE project or makefile. You’re done.

  • Tracy is now integrated into the application.

CMake
  • Tracy uses the CMake build system. Unlike in most other programs, the root-level CMakeLists.txt file is only used to provide client integration. The build definition files used to create profiler executables are stored in directories specific to each utility.

  • The CMakeLists.txt file only contains the general definition of how the program should be built. To be able to actually compile the program, you must first create a build directory that takes into account the specific compiler you have on your system, the set of available libraries, the build options you specify, and so on.

  • You can do this by issuing the following command, in this case for the profiler utility:

cmake -B profiler/build -S profiler -DCMAKE_BUILD_TYPE=Release
  • Now that you have a build directory, you can actually compile the program. For example, you could run the following command:

cmake --build profiler/build --config Release --parallel
  • The build directory can be reused if you want to compile the program in the future, for example if there have been some updates to the source code, and usually does not need to be regenerated. Note that all build artifacts are contained in the build directory.

  • You can integrate Tracy with CMake by adding the git submodule folder as a subdirectory.

# Set options before add_subdirectory.
# Available options: TRACY_ENABLE, TRACY_LTO, TRACY_ON_DEMAND, TRACY_NO_BROADCAST, TRACY_NO_CODE_TRANSFER, ...
option(TRACY_ENABLE "" ON)
option(TRACY_ON_DEMAND "" ON)
add_subdirectory(3rdparty/tracy) # target: TracyClient or alias Tracy::TracyClient
  • Link Tracy::TracyClient  to any target where you use Tracy for profiling:

target_link_libraries(<TARGET> PUBLIC Tracy::TracyClient)
  • With CMake 3.11+, you can use Tracy via CMake FetchContent (note that FetchContent_MakeAvailable, used here, was added in CMake 3.14). In this case, you do not need to add a git submodule for Tracy manually. Add this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
    tracy
    GIT_REPOSITORY https://github.com/wolfpld/tracy.git
    GIT_TAG        master
    GIT_SHALLOW    TRUE
    GIT_PROGRESS   TRUE
)
FetchContent_MakeAvailable(tracy)
  • Then add this to any target where you use tracy for profiling:

target_link_libraries(<TARGET> PUBLIC TracyClient)
Static Library
  • If you are compiling Tracy as a static library to link with your application, you may encounter some unexpected problems if your code does not use any symbols from the library. To avoid this, you can simply add the TracyNoop macro somewhere in your code, for example in the main function. The macro doesn’t do anything useful, but it inserts a reference that is satisfied by the static library, which results in the Tracy code being linked in and the profiler being able to work as intended.
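
  • A minimal sketch of the TracyNoop placement. run_app is a hypothetical stand-in for the body of main, and the fallback macro only keeps the snippet buildable when Tracy is not on the include path:

```cpp
#if defined(__has_include)
#  if __has_include(<tracy/Tracy.hpp>)
#    include <tracy/Tracy.hpp>
#  endif
#endif
#ifndef TracyNoop
#  define TracyNoop
#endif

// Hypothetical entry point; in a real program this body would sit in main().
int run_app()
{
    // References a Tracy symbol so the linker keeps the static library's
    // code, even if nothing else in the program touches Tracy directly.
    TracyNoop;
    return 0;
}
```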

Server steps
  • (2025-11-04) I did it this way.

    git clone --recurse-submodules https://github.com/oskarnp/odin-tracy
    
    • In the odin-tracy directory, using the x64 Native Tools Command Prompt for VS 20XX:

    cd tracy\vcpkg
    .\install_vcpkg_dependencies.bat
    
    • Add #include <chrono>  to tracy\server\TracyView.hpp

    cd tracy\profiler\build\win32
    msbuild Tracy.sln -t:Build -p:Configuration=Release
    
    • The server executable is in odin-tracy\tracy\profiler\build\win32\x64\Release .

Client steps
  • (2025-11-04) I did it this way.

  • While in odin-tracy  dir:

  • cl -MT -O2 -DTRACY_ENABLE -c tracy\public\TracyClient.cpp -Fotracy

    • cl  invokes MSVC.

    • -MT  links against the static multithreaded CRT.

    • -O2  enables full optimization.

    • -DTRACY_ENABLE  defines the preprocessor symbol so Tracy’s instrumentation is active.

    • -c  compiles only and produces an object file.

    • tracy\public\TracyClient.cpp  is the source.

    • -Fotracy  sets the output object file name to tracy.obj .

  • lib tracy.obj

    • lib  creates a static library.

    • tracy.obj  is the input.

    • The default output is tracy.lib  unless another name is passed.

  • Effect: compile the Tracy client into an object using static CRT and optimizations, then pack it into a static library.

  • This will create the files:

    • odin-tracy/tracy.obj

      • Only used to generate the tracy.lib

    • odin-tracy/tracy.lib

      • Used by the odin-tracy binding.

  • Remember to:

    • Pass -define:TRACY_ENABLE=true when building the Odin program.

Pre-built binaries
  • The version releases of the profiler are provided as precompiled Windows binaries for download at https://github.com/wolfpld/tracy/releases, along with the user manual.

  • You will need to install the latest Visual C++ redistributable package to use them.

  • Note that these binary releases require AVX2 instruction set support on the processor. If you have an older CPU, you will need to set a proper instruction set architecture in the project properties and build the executables yourself.

Spall

  • Spall-web .

  • About .

  • The Jobs  repo uses Spall in the examples:

    • Boids.

      • I prefer this one, because it's visual with RayLib.

    • Background.

    • Simple.

  • A .spall  file is generated and used on the site.

  • For a non-web version, you have to buy Spall for $100.

  • Impressions :

    • I don't like it being web-based.

    • It's solid, cool, made in Odin, nice.

    • Generates local files, very simple to understand.

    • Officially supported by Odin.

Nvidia Nsight Graphics - GPU Trace

Nsight Graphics

  • GPU Trace .

  • Supports a limited number of frames (e.g. up to ~1–15 frames, depending on options). Useful for small multi-frame captures where you already know the target time window.

  • (2025-10-04)

    • I had to run as an admin to use it.


AMD GPU Profiler (AMD RGP)

  • https://gpuopen.com/rgp/

  • RGP has historically been single-frame focused, though the driver tools have timeline features for profiling.

Intel GPA


Nsight Systems

  • https://developer.nvidia.com/nsight-systems

  • (2025-10-04)

    • Alt + Scroll:

      • Scrolls horizontally.

      • I didn't like that.

    • Ctrl + Scroll:

      • Zoom.


    • I thought the information was pretty "bad" and not specific to Vulkan.

    • I recorded 1m18s of gameplay, and it took a few minutes to generate the profile.